Convert execd to use pcmk__request_t #3915

Merged

nrwahl2 merged 20 commits into ClusterLabs:main from clumens:execd-request_t

Jul 31, 2025

Contributor

clumens commented Jul 14, 2025

We have a long-standing project to convert all daemons to use the pcmk__request_t type. Once this is done, we can start reducing code duplication across the daemons and start doing a bunch more cleanups (see projects related to T126). This PR addresses execd.

I've structured this a little unusually to make it easier to develop. execd handles a bunch of different IPC messages, and it's easier to develop and debug if you convert them only one at a time. However, we also have a default handler. So I've added a commit titled "XXX DON'T PUSH THIS" and then a later commit that reverts it. These two get rid of the default handler, allowing the old code to still be used as a fallback for messages I haven't converted yet.

Obviously, I would get rid of these before pushing this PR for real. But, doing that will also make it difficult to bisect through this PR should we ever need to. So, I'm open to suggestions.

I haven't given this the thorough testing it requires yet, due to the problem I mentioned in Slack where resources sometimes don't start. In less comprehensive testing, though, this has been working fine.

nrwahl2 reviewed

clumens force-pushed the execd-request_t branch 4 times, most recently from 1b3051b to 6b0ff95 Compare

July 15, 2025 20:54

nrwahl2 reviewed

Contributor

nrwahl2 left a comment

Reviewed through CRM_OP_REGISTER commit. May or may not get further tonight.

daemons/execd/execd_messages.c Outdated

-                  if (!c->name) {
-                      const char *value = crm_element_value(msg,
-                                                            PCMK__XA_LRMD_CLIENTNAME);

Contributor

nrwahl2 Jul 16, 2025

I don't see why the attribute would ever be missing, so I
think it's safe to get rid of the lengthier approach that grabs the
client PID as a last option.

I don't think c->name gets set anywhere outside of this function and lrmd_remote_client_msg().

PCMK__XA_LRMD_CLIENTNAME gets set when an API client connects. (And it strangely gets re-set to the value of c->name here and in lrmd_remote_client_msg(), which seems redundant.)

lrmd_t:connect()
-> lrmd_handshake()
    -> lrmd_handshake_hello_msg()

But the lrmd API is public, and the name argument to lrmd_t:connect() can be NULL.
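Because a caller may pass NULL as the name, a server-side fallback is one way to guarantee that the client name is never empty. The sketch below is a minimal, self-contained illustration and not the Pacemaker code: `demo_client_t`, `demo_set_client_name()`, and the `pid-<n>` fallback format are hypothetical, and the `supplied` argument stands in for the value of PCMK__XA_LRMD_CLIENTNAME.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for an IPC client record; not Pacemaker's type. */
typedef struct {
    int pid;
    char *name;
} demo_client_t;

/* Set client->name once. Prefer the name supplied with the request (which
 * may be NULL, since a public API caller can pass NULL). Otherwise fall back
 * to a name derived from the client PID.
 */
static void
demo_set_client_name(demo_client_t *client, const char *supplied)
{
    assert(client != NULL);

    if (client->name != NULL) {
        return;     /* Already named; keep the first name we were given */
    }

    if ((supplied != NULL) && (supplied[0] != '\0')) {
        size_t len = strlen(supplied) + 1;

        client->name = malloc(len);
        assert(client->name != NULL);
        memcpy(client->name, supplied, len);

    } else {
        /* Assumed fallback format, for illustration only */
        int len = snprintf(NULL, 0, "pid-%d", client->pid);

        client->name = malloc((size_t) len + 1);
        assert(client->name != NULL);
        snprintf(client->name, (size_t) len + 1, "pid-%d", client->pid);
    }
}
```

With a fallback like this in one place, later code can use the client name without checking it for NULL.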

clumens force-pushed the execd-request_t branch from 6b0ff95 to 9ebafef Compare

July 16, 2025 18:25

nrwahl2 reviewed

Contributor

nrwahl2 left a comment

*checks clock* Good night.

daemons/execd/execd_messages.c

+              {
+                  int call_id = 0;
+                  int rc = pcmk_rc_ok;
+                  bool allowed = pcmk_is_set(request->ipc_client->flags,

Contributor

nrwahl2 Jul 17, 2025

It might eventually be nice to do something like we have with cib__op_attr_privileged, to reduce some duplication in the handlers. Maybe also a flag for needing call ID, then based on that, getting it and CRM_CHECK-ing that we succeeded; or something. Not worth the trouble right now.

See also enum crm_rsc_flags.
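The idea of per-operation flags could look roughly like the following. This is only a sketch of the suggestion, not existing Pacemaker code: the flag names, `demo_op_t`, `demo_check_request()`, and the operation names are hypothetical, the flag assignments are arbitrary, and plain `errno` values stand in for Pacemaker return codes.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical per-operation attributes */
enum demo_op_flags {
    demo_op_none       = 0,
    demo_op_privileged = (1U << 0),  /* caller must be a privileged client */
    demo_op_call_id    = (1U << 1),  /* request must carry a call ID */
};

typedef struct {
    const char *name;
    uint32_t flags;
} demo_op_t;

/* Placeholder operation names with arbitrarily chosen flags */
static const demo_op_t demo_ops[] = {
    { "demo_register", demo_op_none },
    { "demo_info",     demo_op_call_id },
    { "demo_exec",     demo_op_privileged | demo_op_call_id },
};

/* Run the common checks once, before dispatching to a handler.
 * A call_id <= 0 is treated as "not provided" in this sketch.
 */
static int
demo_check_request(const char *op, bool client_privileged, int call_id)
{
    assert(op != NULL);

    for (size_t i = 0; i < sizeof(demo_ops) / sizeof(demo_ops[0]); i++) {
        if (strcmp(op, demo_ops[i].name) != 0) {
            continue;
        }
        if ((demo_ops[i].flags & demo_op_privileged) && !client_privileged) {
            return EACCES;
        }
        if ((demo_ops[i].flags & demo_op_call_id) && (call_id <= 0)) {
            return EINVAL;
        }
        return 0;
    }
    return EOPNOTSUPP;  /* No such operation in the table */
}
```

Each handler would then only run after the shared checks pass, which is where the reduction in duplication would come from.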

clumens added the review: in progress label

clumens force-pushed the execd-request_t branch from 9ebafef to 375c8ff Compare

July 18, 2025 19:08

Contributor Author

clumens commented Jul 18, 2025

I've addressed most review comments, but there's still the ack/nack and response return code stuff to straighten out.

nrwahl2 reviewed

nrwahl2 reviewed

Contributor

nrwahl2 left a comment

The update looks good to me, aside from the handful of comments that I just left (e.g., CRM_LOG_ASSERT()). I should've made them into a review instead of single comments for ease of finding, oh well...

As you mentioned, there are still things to consider before merging. I'm pretty much good with the parts that have been addressed though.

clumens added 17 commits

July 21, 2025 14:23


          Refactor: daemons: Move IPC server skeleton code to execd_messages.c.

3ee3cf3

This is the beginning of using the pcmk__request_t interface in the
server side of pacemaker-execd.  For now, there are no substantial code
changes - just moving the code into a new file.


          Refactor: daemons: Rename variables in execd_messages.c.

836ae0d

This brings them more in line with usage in pe_ipc_dispatch (I'm hoping
to condense all this code one day), and frees up the name "request" to
be used by pcmk__request_t in the future.


          Refactor: daemons: Add execd_process_message.

a3bb87a

At the moment, this function is fairly unfortunately named (there's
already a process_lrmd_message).  However, the new function is gradually
going to be taking code from the old function until the old one is
completely gone.  So it's just a temporary confusion we'll need to live
with for a bit.

This new function holds the framework for handling commands with
pcmk__request_t instead of the old system.  It's a little tricky to
convert these commands over one at a time and still keep everything
working, which is why there's the big TODO comment.


          Refactor: daemons: Condense duplicated IPC code.

9f83b81

Both places that call execd_process_message first add several additional
attributes to the request XML, so it makes sense to move that code into
execd_process_message instead.

Note that there are two different approaches to setting client->name if
it's missing.  I don't see why the attribute would ever be missing, so I
think it's safe to get rid of the lengthier approach that grabs the
client PID as a last option.


          Refactor: daemons: Use pcmk__request_t for CRM_OP_REGISTER.

78e8997

This commit is typical of how the commit for each command is going to
work:

* Expose a function in execd_commands.c to the rest of the exec daemon
  and give it a name that is consistent with "public" functions.

* Have it return a standard pacemaker return code.

* Remove the old handler block from process_lrmd_message (this is the
  older function, not the newer confusingly named function).

* Add a new handler function and add that function to the handler table.
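The handler-table pattern these steps refer to can be sketched in isolation. The code below is a simplified, self-contained illustration. `demo_request_t`, `demo_handlers`, the handler functions, and the operation names are all hypothetical and are not the pcmk__request_t API itself; `errno` values stand in for standard Pacemaker return codes.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, pared-down request object */
typedef struct {
    const char *op;     /* operation name taken from the request */
    int handled_by;     /* set by whichever handler ran (demo only) */
} demo_request_t;

typedef int (*demo_handler_fn)(demo_request_t *request);

typedef struct {
    const char *op;         /* NULL marks the default handler */
    demo_handler_fn fn;
} demo_handler_entry_t;

static int
demo_handle_register(demo_request_t *request)
{
    request->handled_by = 1;
    return 0;
}

static int
demo_handle_info(demo_request_t *request)
{
    request->handled_by = 2;
    return 0;
}

/* Catches anything that no other table entry matched */
static int
demo_handle_unknown(demo_request_t *request)
{
    request->handled_by = 0;
    return EOPNOTSUPP;
}

/* Converting one more command means adding one more row here */
static const demo_handler_entry_t demo_handlers[] = {
    { "demo_register", demo_handle_register },
    { "demo_info",     demo_handle_info },
    { NULL,            demo_handle_unknown },   /* must stay last */
};

/* Find the first matching entry, or fall through to the default handler */
static int
demo_dispatch(demo_request_t *request)
{
    const demo_handler_entry_t *entry = demo_handlers;

    assert((request != NULL) && (request->op != NULL));

    while ((entry->op != NULL) && (strcmp(entry->op, request->op) != 0)) {
        entry++;
    }
    return entry->fn(request);
}
```

In this sketch, the default row is what makes one-at-a-time conversion awkward: once it exists, every unconverted message lands in it instead of reaching older code.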


          Refactor: daemons: Use pcmk__request_t for LRMD_OP_RSC_INFO.

2e4be16


          Refactor: daemons: Expose two more functions to all of execd.

a21c989

...and rename them to make them consistent.  These are needed to handle
processing certain message types.


          Refactor: daemons: Use pcmk__request_t for LRMD_OP_RSC_REG.

397804a

Handling this message type means we also need to send out a generic
notify message after sending the reply.  For the moment, the rc value
being passed to execd_send_generic_notify doesn't matter because
execd_process_rsc_register doesn't return anything (previously, it only
ever returned pcmk_ok).


          Refactor: daemons: Use pcmk__request_t for LRMD_OP_RSC_EXEC.

694061d

Additionally, execd_process_rsc_exec should return a standard Pacemaker
return code.  It doesn't do anything useful with the call_id value it
extracts and returns, and the caller already knows the call_id (and
doesn't do anything with the return value from execd_process_rsc_exec
aside from checking whether it's an error).


          Refactor: daemons: Use pcmk__request_t for LRMD_OP_RSC_CANCEL.

cd1ba02

This additionally changes a couple functions to return a standard
Pacemaker return code.


          Refactor: daemons: Use pcmk__request_t for LRMD_OP_ALERT_EXEC.

0f1ef52


          Refactor: daemons: Use pcmk__request_t for LRMD_OP_POKE.

9dcef97


          Refactor: daemons: Use pcmk__request_t for LRMD_OP_GET_RECURRING.

f49fe3a


          Refactor: daemons: Use pcmk__request_t for LRMD_OP_CHECK.

63f0603


          Refactor: daemons: Use pcmk__request_t for LRMD_OP_RSC_UNREG.

10d2a05

This requires a little extra work, in addition to the steps of exposing
a function and changing its return type to be a standard Pacemaker
return code.

Sending a generic notify for unregistering a resource also requires
inspecting the return value, which we luckily have in the reply XML so
we can just grab it from there as needed.


          Refactor: daemons: Use pcmk__request_t for LRMD_OP_IPC_FWD.

c59059b


          Refactor: daemons: Add handle_unknown_request.

7d56d46

Now that all messages are supported by the new code, we can add the
unknown message handler to catch anything else.  And that means
everything is being handled by execd_process_message now, so
process_lrmd_message can be removed.

Fixes T126

clumens added 3 commits

July 21, 2025 14:42


          Low: daemons: Correct result error string on unknown IPC messages.

ba47d0b


          Refactor: daemons: Add an execd_create_reply macro...

5dc6e6c

and rename the existing function to be execd_create_reply_as.  This gets
rid of the need to pass __func__ as the first argument every time.
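The macro-over-function pattern described here can be shown with a small standalone example. The names `demo_create_reply_as()` and `demo_create_reply()` and their parameters are hypothetical; the real execd function builds XML and has a different signature, which is not reproduced here.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical "_as" variant: takes the caller's name explicitly and writes
 * a text reply into buf (the real code would build an XML reply instead).
 */
static int
demo_create_reply_as(const char *origin, int rc, int call_id,
                     char *buf, size_t buf_len)
{
    assert((origin != NULL) && (buf != NULL));
    return snprintf(buf, buf_len, "origin=%s rc=%d call_id=%d",
                    origin, rc, call_id);
}

/* Wrapper macro: __func__ expands at the call site, so the reply records
 * the name of the calling function without the caller passing it.
 */
#define demo_create_reply(rc, call_id, buf, buf_len) \
    demo_create_reply_as(__func__, (rc), (call_id), (buf), (buf_len))

/* Example caller */
static void
demo_handler(char *buf, size_t buf_len)
{
    (void) demo_create_reply(0, 7, buf, buf_len);
}
```

This has to be a macro rather than an inline function, because `__func__` inside a function would name that function instead of its caller.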


          Low: daemons: Return CRM_EX_PROTOCOL when ACKing an unknown request.

34d7619

Only a couple daemons send an ACK on an unknown request.  Those that do
send CRM_EX_INVALID_PARAM, which doesn't quite seem like the right
value.  I guess I can see what it's going for, but CRM_EX_PROTOCOL seems
like a better fit.

clumens force-pushed the execd-request_t branch from 375c8ff to 821786b Compare

July 22, 2025 13:25

nrwahl2 reviewed

Contributor

nrwahl2 left a comment

This all looks good to me, except see ACK/NACK comment.

daemons/attrd/attrd_ipc.c Outdated

                   if (xml == NULL) {
                       crm_debug("Unrecognizable IPC data from PID %d", pcmk__client_pid(c));
-                      pcmk__ipc_send_ack(client, id, flags, PCMK__XE_ACK, NULL,
+                      pcmk__ipc_send_ack(client, id, flags, PCMK__XE_NACK, NULL,

Contributor

nrwahl2 Jul 31, 2025

I think, but am not remotely sure, that some clients may need to be updated -- either to return early, or to properly handle a NACK in some other way.

Previously, the only things that sent NACK were the CIB manager (for invalid message), the fencer, and the executor (in remote_proxy_cb()).

No client looks for NACK, but they do look for ACK. The following clients' IPC dispatch functions return immediately if they receive an ACK (but not a NACK):

attrd
controld
pacemakerd
schedulerd

I'm starting to think it'd be simpler to drop PCMK__XE_NACK and drop the tag argument of pcmk__ipc_send_ack(), and instead send PCMK__XE_ACK unconditionally. Clients can use the exit code.

Contributor Author

clumens Jul 31, 2025

Yeah, that's kind of what I was thinking too based on the commit message. I don't remember why we have both ACK vs. NACK and an exit code. One other thing to note here is that not every message gets an ACK, either. I'm also not convinced that any of this matters terribly much, given that we control both sides of the connection and it takes place on the same system with what I assume will always be the same versions of the Pacemaker daemons.

I'd be fine with dropping this patch, merging this PR, and then dealing with the ACK vs. NACK stuff in a later PR.

clumens force-pushed the execd-request_t branch from 821786b to 34d7619 Compare

July 31, 2025 21:12

nrwahl2 approved these changes

nrwahl2 merged commit cbc7fe8 into ClusterLabs:main

1 check passed

clumens deleted the execd-request_t branch

August 8, 2025 17:36

Labels

review: in progress